SMT Experiments for Romanian and German Using JRC-ACQUIS

Author

  • Monica Gavrila
Abstract

Machine translation (MT) is one of the language technology (LT) applications that gives users access to information in their mother tongue. Unfortunately, less widely spoken languages, a category that includes the Balkan and Slavic languages, have to overcome a major gap in language resources, reference systems and tools. In its simplest form, statistical machine translation (SMT) requires only a large parallel corpus, and therefore it seems to be a solution for these languages. This paper investigates the performance of a Moses-based SMT system for Romanian and German, using test data from two different domains: legislation (JRC-ACQUIS) and the manual of an electronic device. The results are compared with those of the Google online translation tool. An analysis of the translation output gives an overview of the main challenges and sources of error in these experimental settings.

Similar resources

Identifying Cross Language Term Equivalents Using Statistical Machine Translation and Distributional Association Measures

This article compares the accuracy of several approaches for identifying cross-language term equivalents (translations). The methods investigated are, on the one hand, associative measures commonly used in word-space models or in Information Retrieval and, on the other hand, a Statistical Machine Translation (SMT) approach. I have performed tests on six language pairs...

Training Data in Statistical Machine Translation - the More, the Better?

Current statistical machine translation (SMT) systems are said to depend on the availability of very large amounts of training data for producing the language and translation models. Unfortunately, large parallel corpora are available for a limited set of language pairs and for an even more limited set of domains. In this paper we investigate the behavior of an SMT system exposed to training dat...

Experiments with Small-sized Corpora in CBMT

There is no doubt that in the last couple of years corpus-based machine translation (CBMT) approaches have been in focus. Each of the approaches has its advantages and disadvantages. Therefore, hybrid approaches have been developed. This paper presents a comparative study of CBMT approaches, using three types of systems: a statistical MT (SMT) system, an example-based MT (EBMT) system and a hyb...

Experiments with Small-size Corpora in CBMT

There is no doubt that in the last couple of years corpus-based machine translation (CBMT) approaches have been in focus. Each of the approaches has its advantages and disadvantages. Therefore, hybrid approaches have been developed. This paper presents a comparative study of CBMT approaches, using three types of systems: a statistical MT (SMT) system, an example-based MT (EBMT) system and a hyb...

Experiments on Processing Overlapping Parallel Corpora

The number and sizes of parallel corpora keep growing, which makes it necessary to have automatic methods of processing them: combining, checking and improving corpora quality, etc. We here introduce a method which enables performing many of these by exploiting overlapping parallel corpora. The method finds the correspondence between sentence pairs in two corpora: first the corresponding langua...

Publication date: 2010